Using Conditional Random Fields to Predict Pitch Accents in Conversational Speech

Authors

  • Michelle L. Gregory
  • Yasemin Altun
Abstract

The detection of prosodic characteristics is an important aspect of both speech synthesis and speech recognition. Correct placement of pitch accents yields more natural-sounding synthesized speech, while automatic detection of accents can contribute to better word-level recognition and better textual understanding. In this paper we investigate probabilistic, contextual, and phonological factors that influence pitch accent placement in natural, conversational speech in a sequence labeling setting. We introduce Conditional Random Fields (CRFs) to the pitch accent prediction task in order to incorporate these factors efficiently in a sequence model. We demonstrate the usefulness and the incremental effect of these factors in a sequence model by performing experiments on hand-labeled data from the Switchboard Corpus. Our model outperforms the baseline and previous models of pitch accent prediction on the Switchboard Corpus.
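
To make the sequence labeling setting concrete, the following is a minimal sketch of Viterbi decoding under a linear-chain scoring scheme of the kind a CRF uses: each word receives one of two labels, and the score of a label sequence is the sum of per-word scores and label-to-label transition scores. Everything here is an assumption for illustration: the two feature names, the label names, and the hand-set weights are hypothetical and are not taken from the paper, and CRF training (learning the weights from data) is omitted entirely.

```python
# Toy linear-chain decoder for pitch accent labels.
# All names and weights below are hypothetical, hand-set for illustration only.

LABELS = ("none", "accent")

# Score for assigning a label to a word with a given (hypothetical) feature.
EMISSION = {
    ("content_word", "accent"): 1.5,
    ("content_word", "none"): 0.0,
    ("function_word", "accent"): 0.0,
    ("function_word", "none"): 1.5,
}

# Score for moving from the previous label to the current label.
TRANSITION = {
    ("accent", "accent"): -0.5,
    ("accent", "none"): 0.5,
    ("none", "accent"): 0.5,
    ("none", "none"): 0.0,
}


def viterbi(features):
    """Return the highest-scoring label sequence for a non-empty feature list."""
    # best[i][y] = (score of the best path ending in label y at position i,
    #               label chosen at position i - 1 on that path)
    best = [{y: (EMISSION[(features[0], y)], None) for y in LABELS}]
    for feat in features[1:]:
        column = {}
        for y in LABELS:
            prev, score = max(
                ((p, best[-1][p][0] + TRANSITION[(p, y)]) for p in LABELS),
                key=lambda item: item[1],
            )
            column[y] = (score + EMISSION[(feat, y)], prev)
        best.append(column)

    # Start from the best final label and follow the backpointers.
    label = max(LABELS, key=lambda y: best[-1][y][0])
    path = [label]
    for column in reversed(best[1:]):
        label = column[label][1]
        path.append(label)
    return path[::-1]


if __name__ == "__main__":
    # Hypothetical features for a four-word utterance such as "the dog is barking".
    print(viterbi(["function_word", "content_word", "function_word", "content_word"]))
```

In a trained CRF the weights would be estimated from labeled data and the feature set would be far richer; the sketch only shows how per-word evidence and neighboring labels combine when choosing a label sequence.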

Similar resources

Detecting Prominence in Conversational Speech: Pitch Accent, Givenness and Focus

The variability and reduction that are characteristic of talking in natural interaction make it very difficult to detect prominence in conversational speech. In this paper, we present analytic studies and automatic detection results for pitch accent, as well as for the realization of information structure phenomena such as givenness and focus. For pitch accent, our conditional random field model co...

Better Punctuation Prediction with Dynamic Conditional Random Fields

This paper focuses on the task of inserting punctuation symbols into transcribed conversational speech texts, without relying on prosodic cues. We investigate limitations associated with previous methods, and propose a novel approach based on dynamic conditional random fields. Different from previous work, our proposed approach is designed to jointly perform both sentence boundary and sentence ...

Discriminative training and unsupervised adaptation for labeling prosodic events with limited training data

Many applications of spoken-language systems can benefit from having access to annotations of prosodic events. Unfortunately, obtaining human annotations of these events, even sensible amounts to train a supervised system, can become a laborious and costly effort. In this paper we explore applying conditional random fields to automatically label major and minor break indices and pitch accents f...

The perception of phrasal prominence in English, Spanish and French conversational speech

Since Bolinger’s [1] discovery that pitch cues accentual prominence in English, a tension has arisen between two strategies: equating accent with pitch excursions and relying on perception for identifying accented words. This paper investigates the relation between prominence judgments from untrained listeners and accentual labels produced by trained transcribers. Naïve speakers of English, Spa...

Prediction of speech recognition accuracy for utterance classification

The paper deals with the problem of predicting speech recognition quality and filtering poorly recognized utterances in the case when no reference transcripts are available. In the proposed system, word error rate (WER) predictions for individual utterances are made using conditional random fields (CRF), and classification based on a given threshold is performed afterwards. We propose using a b...

Publication date: 2004